The dataset comes from FiveThirtyEight and is available at https://github.com/fivethirtyeight/guns-data. It is stored in the guns.csv file and contains information on gun deaths in the US from 2012 to 2014. Each row represents a single fatality, and the columns contain demographic and other information about the victim. Here are the first few rows of the dataset:
In [25]:
import csv
# Read the whole file into a list of rows and close the file afterwards
with open('guns.csv', 'r') as f:
    data = list(csv.reader(f))
print(data[:5])
In [26]:
#removing header row
headers = data[:1]
data = data[1:]
print(data[:5])
In [27]:
# Count how many times each value occurs in the year column
years = [each[1] for each in data]
year_counts = {}
for each in years:
    if each in year_counts:
        year_counts[each] += 1
    else:
        year_counts[each] = 1
print(year_counts)
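The same tally can also be sketched with `collections.Counter` from the standard library. The rows below are made-up stand-ins for `data` (only the year at index 1 matters here), not values from guns.csv.

```python
from collections import Counter

# Hypothetical sample rows standing in for `data`; index 1 holds the year
sample_rows = [
    ['1', '2012', '01'],
    ['2', '2012', '02'],
    ['3', '2013', '01'],
]

# Counter tallies how many times each year appears
sample_year_counts = Counter(row[1] for row in sample_rows)
print(dict(sample_year_counts))
```

A `Counter` behaves like a dictionary, so it could replace the manual if/else bookkeeping wherever only the counts are needed.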
In [28]:
#Let's see if gun deaths in the US change by month and year
import datetime
dates = [datetime.datetime(year=int(each[1]), month=int(each[2]), day=1) for each in data]
date_counts = {}
for each in dates:
    if each in date_counts:
        date_counts[each] += 1
    else:
        date_counts[each] = 1
dates[:5]
Out[28]:
The sex and race columns contain potentially interesting information on how gun deaths in the US vary by gender and race. Both columns can be explored with the same dictionary-counting technique used earlier.
In [29]:
sex_counts = {}
race_counts = {}
for each in data:
    sex = each[5]
    if sex in sex_counts:
        sex_counts[sex] += 1
    else:
        sex_counts[sex] = 1
for each in data:
    race = each[7]
    if race in race_counts:
        race_counts[race] += 1
    else:
        race_counts[race] = 1
print(race_counts)
print(sex_counts)
However, our analysis only gives us the total number of gun deaths by race in the US. Unless we know the population of each race in the US, we can't meaningfully compare those numbers. What I want is the rate of gun deaths per 100,000 people of each race.
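The rate is the number of deaths divided by the population of the group, multiplied by 100,000. A minimal sketch with made-up numbers (the function name `rate_per_100k` and both figures are hypothetical, not taken from the dataset or the census file):

```python
def rate_per_100k(deaths, population):
    # Deaths per 100,000 people in the group
    return deaths / population * 100000

# Hypothetical figures for illustration only
example_rate = rate_per_100k(deaths=50, population=1000000)
print(example_rate)
```

Scaling by population this way makes groups of very different sizes comparable.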
In [30]:
# Read the census file and close it once the rows are loaded
with open('census.csv', 'r') as f:
    census = list(csv.reader(f))
census
Out[30]:
In [31]:
mapping = {
    'Asian/Pacific Islander': int(census[1][14]) + int(census[1][15]),
    'Black': int(census[1][12]),
    'Native American/Native Alaskan': int(census[1][13]),
    'Hispanic': int(census[1][11]),
    'White': int(census[1][10])
}
race_per_hundredk = {}
for key, value in race_counts.items():
    result = race_counts[key] / mapping[key] * 100000
    race_per_hundredk[key] = result
race_per_hundredk
Out[31]:
In [32]:
#We can filter our results, and restrict them to the Homicide intent
intents = [each[3] for each in data]
races = [each[7] for each in data]
homicide_race_counts = {}
for i, each in enumerate(races):
    if intents[i] == 'Homicide':
        # Start at 0 for a new race, then count this row as well
        if each not in homicide_race_counts:
            homicide_race_counts[each] = 0
        homicide_race_counts[each] += 1
homicide_race_counts
Out[32]:
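The filtered count above can also be sketched by pairing the two lists with `zip` and tallying with `collections.Counter`. The sample lists below are hypothetical stand-ins for `intents` and `races`, not values from the dataset.

```python
from collections import Counter

# Hypothetical stand-ins for the `intents` and `races` lists
sample_intents = ['Homicide', 'Suicide', 'Homicide', 'Homicide']
sample_races = ['White', 'White', 'Black', 'White']

# Pair each intent with its race and count only the homicide rows
sample_homicide_counts = Counter(
    race for intent, race in zip(sample_intents, sample_races)
    if intent == 'Homicide'
)
print(dict(sample_homicide_counts))
```

Iterating over pairs avoids indexing one list by the position in another, which is where off-by-one mistakes tend to creep in.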
In [33]:
homicide_race_per_hundredk = {}
for key, value in homicide_race_counts.items():
    result = homicide_race_counts[key] / mapping[key] * 100000
    homicide_race_per_hundredk[key] = result
homicide_race_per_hundredk
Out[33]:
In [34]:
month_homicide_rate = {}
months = [int(each[2]) for each in data]
for i, each in enumerate(months):
    if intents[i] == 'Homicide':
        # Start at 0 for a new month, then count this row as well
        if each not in month_homicide_rate:
            month_homicide_rate[each] = 0
        month_homicide_rate[each] += 1
month_homicide_rate
Out[34]:
In [67]:
def months_diff(input_dict):
    # Find the months with the highest and lowest homicide counts
    max_key = max(input_dict, key=input_dict.get)
    min_key = min(input_dict, key=input_dict.get)
    max_value = input_dict[max_key]
    min_value = input_dict[min_key]
    # Ratio of the highest monthly count to the lowest
    gap = round(max_value / min_value, 2)
    print('max month is', max_key, 'has', max_value, 'and min month is', min_key, 'has', min_value, '. The gap between min and max months is', gap, '!')
In [68]:
months_diff(month_homicide_rate)
As we can see, there is a link between the month of the year and the number of gun homicides. In June, gun-related homicides are committed in 1